AITopics | high-fidelity image generation

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Neural Information Processing SystemsDec-24-2025, 20:07:16 GMT

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information to the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.

high-fidelity image generation, modulating quantized vector, name change, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.64)
Information Technology > Artificial Intelligence > Machine Learning (0.44)

Add feedback

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation A Discussion on Masked Image Reconstruction

Neural Information Processing SystemsAug-17-2025, 02:07:48 GMT

This is an extension of Fig.

artificial intelligence, image generation, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

Add feedback

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Neural Information Processing SystemsJan-17-2025, 23:00:11 GMT

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information to the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.

high-fidelity image generation, modulating quantized vector, movq

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

Add feedback

High-Fidelity Image Generation With Fewer Labels

Lucic, Mario, Tschannen, Michael, Ritter, Marvin, Zhai, Xiaohua, Bachem, Olivier, Gelly, Sylvain

arXiv.org Machine LearningMar-6-2019

Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform state-of-the-art (SOTA) on both unsupervised ImageNet synthesis, as well as in the conditional setting. In particular, the proposed approach is able to match the sample quality (as measured by FID) of the current state-of-the art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

1903.02271

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback